gradient estimator
Multistage Conditional Compositional Optimization
Şen, Buse, Hu, Yifan, Kuhn, Daniel
We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Switzerland (0.04)
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Contextual Preference Distribution Learning
Hudson, Benjamin, Charlin, Laurent, Frejinger, Emma
Decision-making problems often feature uncertainty stemming from heterogeneous and context-dependent human preferences. To address this, we propose a sequential learning-and-optimization pipeline to learn preference distributions and leverage them to solve downstream problems, for example risk-averse formulations. We focus on human choice settings that can be formulated as (integer) linear programs. In such settings, existing inverse optimization and choice modelling methods infer preferences from observed choices but typically produce point estimates or fail to capture contextual shifts, making them unsuitable for risk-averse decision-making. Using a bounded-variance score function gradient estimator, we train a predictive model mapping contextual features to a rich class of parameterizable distributions. This approach yields a maximum likelihood estimate. The model generates scenarios for unseen contexts in the subsequent optimization phase. In a synthetic ridesharing environment, our approach reduces average post-decision surprise by up to 114$\times$ compared to a risk-neutral approach with perfect predictions and up to 25$\times$ compared to leading risk-averse baselines.
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
- Transportation > Infrastructure & Services (0.48)
- Transportation > Ground > Road (0.34)
Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations). Our extensive experimental results show that our approaches outperform other state-of-the-art ZO algorithms, and strike a balance between the convergence rate and the function query complexity.
Total stochastic gradient algorithms and applications in reinforcement learning
Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention. In this work we show how the total derivative rule leads to an intuitive visual framework for creating gradient estimators on graphical models. In particular, previous "policy gradient theorems" are easily derived. We derive new gradient estimators based on density estimation, as well as a likelihood ratio gradient, which "jumps" to an intermediate node, not directly to the objective function. We evaluate our methods on model-based policy gradient algorithms, achieve good performance, and present evidence towards demystifying the success of the popular PILCO algorithm.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Jordan (0.05)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada > British Columbia (0.04)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States (0.14)